Bug Report Triaging using Textual, Categorical and Contextual Features using Latent Dirichlet Allocation

Authors

  • Anuradha Sharma
  • Sachin Sharma
Abstract

Software bugs occur for a wide range of reasons. Bug reports can be generated automatically or written by users of the software. Bug reports may also accompany other malfunctions of the software, mostly in beta or unstable versions. Most often, these bug reports are enriched with user-contributed descriptions of what the user actually encountered. Addressing these bugs accounts for the majority of the effort spent in the maintenance phase of a software project's life cycle. Frequently, several bug reports sent by different users correspond to the same defect. Nevertheless, every bug report has to be analyzed separately and carefully for a potential bug. The person responsible for processing newly reported bugs, checking them for duplicates and passing them to suitable developers to be fixed is called a triager, and this process is called triaging. The utility of bug tracking systems is hindered by a large number of duplicate bug reports: in many open source software projects, as many as one third of all reports are duplicates. Identifying duplicate bug reports is time-consuming and adds to the already high cost of software maintenance. In this dissertation, a model of an automated triaging process is proposed based on textual, categorical and contextual similarity features. The contribution of this dissertation is twofold. First, a total of 80 textual features are extracted from the bug reports. Second, topics are modeled from the complete text corpus using Latent Dirichlet Allocation (LDA). These topics are specific to the category, class or functionality of the software; for example, possible topics for an Android bug repository might be Bluetooth, Download and Network. Bug reports are analyzed for context to relate them to the domain-specific topics of the software, thereby enhancing the feature set used to compute similarity scores.
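The pairwise similarity features described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it computes only two textual features (token Jaccard overlap and term-frequency cosine) rather than the 80 mentioned, two categorical features, and one contextual feature. The `BugReport` fields, the `TOPICS` word lists and all helper names are hypothetical; the word lists stand in for the top words of topics that an LDA model would produce, and each report's topic profile is approximated by word-list overlap rather than by LDA inference.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical topic word lists, standing in for the top words of LDA topics.
TOPICS = {
    "bluetooth": {"bluetooth", "pair", "pairing", "headset", "device"},
    "download": {"download", "file", "progress", "storage", "manager"},
    "network": {"network", "wifi", "connection", "signal", "timeout"},
}

@dataclass
class BugReport:
    summary: str     # free-text title of the report
    component: str   # hypothetical categorical field
    priority: int    # hypothetical: 1 (highest) .. 5 (lowest)

def tokenize(text):
    """Lower-case alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts/Counters."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def topic_profile(tokens):
    """Fraction of a report's tokens that fall into each topic's word list."""
    n = len(tokens) or 1
    return {t: sum(tok in words for tok in tokens) / n for t, words in TOPICS.items()}

def pair_features(a, b):
    """Feature vector for a pair: [jaccard, tf_cosine, same_component, priority_gap, contextual]."""
    ta, tb = tokenize(a.summary), tokenize(b.summary)
    sa, sb = set(ta), set(tb)
    jaccard = len(sa & sb) / len(sa | sb) if sa | sb else 0.0   # textual
    tf_cos = cosine(Counter(ta), Counter(tb))                    # textual
    same_component = 1.0 if a.component == b.component else 0.0  # categorical
    priority_gap = abs(a.priority - b.priority)                  # categorical
    contextual = cosine(topic_profile(ta), topic_profile(tb))    # contextual (topic-based)
    return [jaccard, tf_cos, same_component, priority_gap, contextual]
```

For two reports that both describe a Bluetooth headset problem, the contextual feature is high even when their wording only partly overlaps; each resulting vector would form one row of the training data for the duplicate/non-duplicate classifier.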
Finally, two sets are formed, one of duplicate and one of non-duplicate bug report pairs, for binary classification using a Support Vector Machine. Simulation is performed on a Bugzilla dataset. The proposed system improves the efficiency of duplicate checking by 15% compared to the contextual model proposed by Anahita Alipour et al. The system can reduce development cost by improving duplicate checking while still allowing at least one bug report for each real defect to reach developers.
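The final step, separating duplicate from non-duplicate pairs, can be sketched with a linear SVM. This sketch assumes pair feature vectors like those described in the abstract and trains the SVM with the Pegasos stochastic subgradient method using only the Python standard library. The abstract does not say which SVM implementation, kernel or parameters were used, so `lam`, `epochs`, the toy data and the function names are all hypothetical; in practice a library implementation such as LIBSVM would normally be used.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=2000, seed=0):
    """Pegasos-style stochastic subgradient descent on the regularised hinge loss.

    X: list of feature vectors; y: labels in {+1, -1}.
    A constant 1.0 is appended to every vector so the last weight acts as a bias
    (this bias is regularised too, which is a simplification).
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    for t in range(1, epochs + 1):
        i = rng.randrange(len(data))
        eta = 1.0 / (lam * t)
        margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
        # Shrink the weights (regulariser step), then add the hinge subgradient if violated.
        w = [(1 - eta * lam) * wj for wj in w]
        if margin < 1:
            w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w

def predict(w, x):
    """Return +1 for a predicted duplicate pair, -1 for a non-duplicate pair."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

Here each row of `X` would be a pair feature vector and the label +1 marks a duplicate pair; a new incoming report could then be paired with existing reports and flagged as a likely duplicate when any of its pairs is classified as +1.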


Similar resources

A Team Allocation Technique Ensuring Bug Assignment to Existing and New Developers Using Their Recency and Expertise

Existing techniques allocate a bug fixing team using only previous fixed bug reports. Therefore, these techniques may lead to inactive team member allocation as well as fail to include new developers in the suggested list. A Team Allocation approach for ensuring bug assignment to both Existing and New developers (TAEN) is proposed, which uses expertise and recent activities of developers. TAEN ...


En-LDA: An Novel Approach to Automatic Bug Report Assignment with Entropy Optimized Latent Dirichlet Allocation

With the increasing number of bug reports coming into the open bug repository, it is impossible to triage bug reports manually by software managers. This paper proposes a novel approach called En-LDA (Entropy optimized Latent Dirichlet Allocation (LDA)) for automatic bug report assignment. Specifically, we propose entropy to optimize the number of topics of the LDA model and further use the ent...


Applying LDA in Contextual Image Retrieval - ReDCAD participation at ImageCLEF Flickr Photo Retrieval 2012

This paper describes our participation in photo Flickr retrieval task at the ImageCLEF 2012 Campaign. Our aim is to evaluate the performance of topic models, such as Latent Dirichlet Allocation (LDA), in image retrieval based on the textual information surrounding the images. To do this, we propose to extract topics from Flickr user tags using the LDA topic model. Then, we use the Jensen-Shanno...


Semi-Automatic Construction of a Textual Entailment Dataset: Selecting Candidates with Vector Space Models

Recognizing Textual Entailment (RTE) is an NLP task aimed at detecting whether the meaning of a given piece of text entails the meaning of another one. Despite its relevance to many NLP areas, it has been scarcely explored in Portuguese, mainly due to the lack of labeled data. A dataset for RTE must contain both positive and negative examples of entailment, and neither should be obvious: negati...


Automatic keyword extraction using Latent Dirichlet Allocation topic modeling: Similarity with golden standard and users' evaluation

Purpose: This study investigates the automatic keyword extraction from the table of contents of Persian e-books in the field of science using LDA topic modeling, evaluating their similarity with golden standard, and users' viewpoints of the model keywords. Methodology: This is a mixed text-mining research in which LDA topic modeling is used to extract keywords from the table of contents of sci...


Publication date: 2015